-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New function offline_run!
to write data during run!
#815
Conversation
Codecov Report
@@ Coverage Diff @@
## main #815 +/- ##
==========================================
+ Coverage 69.76% 70.02% +0.26%
==========================================
Files 41 42 +1
Lines 2636 2706 +70
==========================================
+ Hits 1839 1895 +56
- Misses 797 811 +14
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is CSV the best thing to use here? Aren't there any other binary-based data formats that can be used instead, and loaded directly into a dataframe...?
Is the problem with CSV "only" that it is not binary, or something else? |
I am surprised that DataFrames.jl does not provide a binary file format that can also be read and written to line-by-line. Seems like the data science people would have figured this out by now. |
Yes the problem with CSV is "only" that it is non-binary, but this is a rather big problem. If you are using this functionality, you probably have a lot of data, and to my knowledge text-based ways to save numerical data are the most inefficient in terms of memory. So the disk space you will fill with this will be very large. |
Haven't taken time to look for an alternate solution to CSV yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We fully agree on the logic. DOn't forget to:
- increment minor version in project.toml
- add changelog entry udner new minor version
- add the new function in the documentation (API page). I guess it should go under the Save. Load, Checpoints section
- Mention the function in
run!
by saying "see also[offline_run!](@ref)
It would be great if we could have a more disk-efficient backend to save. But CSV files are nice as well and some users may prefer them for other reasons. I propose to al;ready anticipate the possibility of different backends and provide a keyword argument that configures what backend should be used.
Would something like this work, at the beginning of
Would this be improved by defining a "maker" function?
and then inside of |
creating a maker function is indeed a better idea. However you should also introduce a function barrier: after you initialize all data frames and the maker function, pass all of these into another function. This establishes type stability where it matters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't be merged unless conflicts are resolved (rebase wiht main
). Please don't forget teh function barrier I am sure it will have a positive impact on performance.
Arrow backend is implemented now. In the process I've condensed the functions a bit. Also added tests for Arrow functionality and they passed locally on my machine. The workflow for adding a new
|
Can anyone test this on Windows? I don't have a Windows machine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this change doesn't make any sense. When the file doesn't exist or when we want to overwrite an existing file (i.e. !append
), then we need to pass file = false
to the write
function. Please revert it.
yes, you are right, it was a last, really desperate, attempt to solve the problem, don't know why but my Windows didn't complain locally for this :D, anyway I tried to dig up a bit for sometime and I still have no clue why Windows denies the possibility to remove the file |
@Tortar Did you run the tests locally on your windows machine and they all passed? If so, then we can be sure that this is a CI windows problem. |
no I didn't explain myself well, they passed when I changed the lines with the non sensical approach :D (but this is probably because I also changed something else), they don't pass locally on my Windows machine, the file |
Ah sorry, misunderstood you then. Hm, really weird that it only fails on Windows but has no problems on Linux and MacOS. No idea how to approach this myself, as well, sorry. :/ |
Sorry for being away but I am overwhelmed with other projects and won't have much time for Agents.jl for the next semester or so... I'll be giving comments if you explicitly as for them by tagging me @Tortar or @fbanning but will have notifications off for the next semester :( Now, to finish this: if I understand correclty the status quo is that this works fine, but doesn't work on windows, yes? If so, can we then add an error to the first line of the function that checks the OS and prints and error, and open an issue that this function doesn't seem to work as expect in windows with Arrow? In the test suite we only run the test if OS is not windows. Then we can merge? Because everything else seems fine. |
Sure thing. I'll add some clauses for that.
I'll do that right after pushing the required OS handling changes and after the tests finally pass. |
Replaces #630